An Evaluation of a Lexicographer's Workbench: Building Lexicons for Machine Translation

Authors

  • Rob Koeling
  • Adam Kilgarriff
  • David Tugwell
  • Roger Evans
Abstract

NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation (MT). We have developed the WASPBENCH, a tool that (1) presents a "word sketch", a summary of the corpus evidence for a word, to the lexicographer; (2) supports the lexicographer in analysing the word into its distinct meanings; and (3) uses the lexicographer's analysis as the input to a state-of-the-art word sense disambiguation algorithm, the output of which is a "word expert" that can then disambiguate new instances of the word. In this paper we describe a set of evaluation experiments designed to establish whether WASPBENCH can be used to save time and improve performance in the development of a lexicon for Machine Translation or another NLP application.

Similar resources

An Evaluation of a Lexicographer's Workbench Incorporating Word Sense Disambiguation

NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation. We have developed the WASPBENCH, a tool that presents a "word sketch", a summary of the corpus evidence for a word, to...

WASPBENCH: a lexicographer's workbench incorporating state-of-the-art word sense disambiguation

Human Language Technologies (HLT) need dictionaries, to tell them what words mean and how they behave. People making dictionaries (lexicographers) need HLT, to help them identify how words behave so they can make better dictionaries. Thus a potential for synergy exists across the range of lexical data in the construction of headword lists, for spelling correction, phonetics, morphology and synt...

Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation

In this paper, we propose a simple yet effective approach to automatically building sentiment lexicons from English sentiment lexicons using publicly available online machine translation services. The method does not rely on any semantic resources or bilingual dictionaries, and can be applied to many languages. We propose to overcome the low coverage problem through putting each English sentimen...

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons for the language pairs Lf–Lp and Lp–Le, we treat these lexicons as parallel corpora. Then, we merge the two extracted phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation

We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal tr...

Publication date: 2003